30 research outputs found

    Direct Evolutionary Optimization of Variational Autoencoders With Binary Latents

    Full text link
    Discrete latent variables are considered important for real-world data, which has motivated research on Variational Autoencoders (VAEs) with discrete latents. However, standard VAE training is not possible in this case, which has motivated different strategies for manipulating discrete distributions in order to train discrete VAEs similarly to conventional ones. Here we ask whether it is also possible to keep the discrete nature of the latents fully intact by applying a direct discrete optimization for the encoding model. The approach consequently diverges strongly from standard VAE training by sidestepping sampling approximations, the reparameterization trick, and amortization. Discrete optimization is realized in a variational setting using truncated posteriors in conjunction with evolutionary algorithms. For VAEs with binary latents, we (A) show how such a discrete variational method ties into gradient ascent for network weights, and (B) how the decoder is used to select latent states for training. Conventional amortized training is more efficient and applicable to large neural networks. However, using smaller networks, we find direct discrete optimization to be efficiently scalable to hundreds of latents. More importantly, we find the effectiveness of direct optimization to be highly competitive in 'zero-shot' learning. In contrast to large supervised networks, the VAEs investigated here can, e.g., denoise a single image without previous training on clean data and/or on large image datasets. More generally, the studied approach shows that training of VAEs is indeed possible without sampling-based approximations and reparameterization, which may be interesting for the analysis of VAE training in general. For 'zero-shot' settings, direct optimization furthermore makes VAEs competitive where they have previously been outperformed by non-generative approaches.
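
    The core mechanism, evolutionary selection of binary latent states under a truncated posterior, can be sketched compactly. The following is a minimal illustration, not the authors' code: the linear Gaussian decoder, Bernoulli prior, population size, and bitflip mutation rate are all illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        H, D, K = 8, 16, 10          # latents, observables, states kept per data point
        W = rng.normal(size=(D, H))  # decoder weights (trained by gradient ascent)
        sigma2, pi = 0.1, 0.2        # observation noise, Bernoulli prior per latent

        def log_joint(x, S):
            """log p(x, s) for each binary state s in the rows of S."""
            recon = S @ W.T                               # decoder mean per state
            ll = -0.5 * np.sum((x - recon) ** 2, axis=1) / sigma2
            lp = np.sum(S * np.log(pi) + (1 - S) * np.log(1 - pi), axis=1)
            return ll + lp

        def evolve_states(x, S, n_children=20, p_flip=1.0 / H):
            """One EA generation: mutate parent states, keep the K fittest."""
            parents = S[rng.integers(len(S), size=n_children)]
            flips = rng.random(parents.shape) < p_flip
            children = np.logical_xor(parents, flips).astype(S.dtype)
            pool = np.unique(np.vstack([S, children]), axis=0)
            fitness = log_joint(x, pool)                  # the decoder selects states
            return pool[np.argsort(fitness)[-K:]]

        x = rng.normal(size=D)                            # one data point (placeholder)
        S = rng.integers(0, 2, size=(K, H))               # initial truncated set
        for _ in range(50):
            S = evolve_states(x, S)
        # S now approximates the K most probable latent states for x; the truncated
        # posterior over S yields the tractable lower bound used to update W.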

    Prototyping a ROOT-based distributed analysis workflow for HL-LHC: the CMS use case

    Full text link
    The challenges expected for the next era of the Large Hadron Collider (LHC), both in terms of storage and computing resources, provide LHC experiments with a strong motivation for evaluating ways of rethinking their computing models at many levels. Great effort has been put into optimizing the computing resource utilization for data analysis, which leads both to lower hardware requirements and to faster turnaround for physics analyses. In this scenario, the Compact Muon Solenoid (CMS) collaboration is involved in several activities aimed at benchmarking different solutions for running High Energy Physics (HEP) analysis workflows. A promising solution is evolving the software towards more user-friendly approaches featuring a declarative programming model and interactive workflows. The computing infrastructure should keep up with this trend by offering modern interfaces on the one hand and hiding the complexity of the underlying environment on the other, while efficiently leveraging the already deployed grid infrastructure and scaling toward opportunistic resources such as public clouds or HPC centers. This article presents the first example of using the ROOT RDataFrame technology to exploit such next-generation approaches for a production-grade CMS physics analysis. A new analysis facility is created to offer users a modern interactive web interface based on JupyterLab that can leverage HTCondor-based grid resources at different geographical sites. The physics analysis is converted from a legacy iterative approach to the modern declarative approach offered by RDataFrame and distributed over multiple computing nodes. The new scenario offers not only an overall improved programming experience, but also an order-of-magnitude speedup with respect to the previous approach.
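
    A minimal sketch of what the converted analysis looks like in this declarative, distributed style. The tree name, file paths, and cuts are placeholders, and the DistRDF Dask backend with a Dask client fronting the HTCondor pool is an assumption about the concrete setup, which the article describes only at a high level.

        import ROOT
        from dask.distributed import Client

        # A local Dask client keeps the sketch self-contained; in the analysis
        # facility the scheduler would instead front HTCondor grid workers
        # (e.g. via dask-jobqueue's HTCondorCluster).
        client = Client()

        RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame
        df = RDataFrame("Events",
                        ["data/sample1.root", "data/sample2.root"],
                        daskclient=client)

        # Declarative analysis: state what to compute, not how to loop over events.
        h = (df.Filter("nMuon >= 2", "at least two muons")
               .Define("m_mumu",
                       "InvariantMass(Muon_pt, Muon_eta, Muon_phi, Muon_mass)")
               .Histo1D(("m_mumu", ";m_{#mu#mu} [GeV];Events", 100, 0.0, 200.0),
                        "m_mumu"))

        h.Draw()  # event loops run on the workers; partial results are merged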

    Software Challenges For HL-LHC Data Analysis

    Full text link
    The high energy physics community is discussing where investment is needed to prepare software for the HL-LHC and its unprecedented challenges. The ROOT project has been one of the central software projects in high energy physics for decades. From its experience and expectations, the ROOT team has distilled a comprehensive set of areas that should see research and development in the context of data analysis software, in order to make the best use of the HL-LHC's physics potential. This work shows what these areas could be, why the ROOT team believes investing in them is needed, which gains are expected, and where related work is ongoing. It can serve as an indication for future research proposals and collaborations.

    ROOT for the HL-LHC: data format

    Full text link
    This document discusses the state, roadmap, and risks of the foundational components of ROOT with respect to the experiments at the HL-LHC (Run 4 and beyond). As foundational components, the document considers in particular the ROOT input/output (I/O) subsystem. The current HEP I/O is based on the TFile container file format and the TTree binary event data format. The work going into the new RNTuple event data format aims at superseding TTree, to make RNTuple the production ROOT event data I/O that meets the requirements of Run 4 and beyond.
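
    For concreteness, a minimal sketch of the current TFile/TTree I/O model that RNTuple is designed to supersede; file, tree, and branch names are placeholders.

        import ROOT
        from array import array

        f = ROOT.TFile("events.root", "RECREATE")  # TFile: the container format
        tree = ROOT.TTree("Events", "event data")  # TTree: the binary event format
        pt = array("f", [0.0])
        tree.Branch("pt", pt, "pt/F")              # one column of the event record

        for value in (10.0, 25.5, 41.2):
            pt[0] = value
            tree.Fill()

        f.Write()
        f.Close()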

    Gaia Early Data Release 3: Structure and properties of the Magellanic Clouds

    Get PDF
    We compare the Gaia DR2 and Gaia EDR3 performance in the study of the Magellanic Clouds and show the clear improvements in precision and accuracy in the new release. We also show that the systematics still present in the data make the determination of the 3D geometry of the LMC a difficult endeavour; this is at the very limit of the usefulness of the Gaia EDR3 astrometry, but it may become feasible with the use of additional external data. We derive radial and tangential velocity maps and global profiles for the LMC for the several subsamples we defined. To our knowledge, this is the first time that the two planar components of the ordered and random motions have been derived for multiple stellar evolutionary phases in a galactic disc outside the Milky Way, showing the differences between younger and older phases. We also analyse the spatial structure and motions in the central region, the bar, and the disc, providing new insights into features and kinematics. Finally, we show that the Gaia EDR3 data allow the Magellanic Bridge to be clearly resolved, and we trace the density and velocity flow of the stars from the SMC towards the LMC not only globally, but also separately for young and evolved populations. This allows us to confirm an evolved population in the Bridge that is slightly shifted from the younger population. Additionally, we were able to study the outskirts of both Magellanic Clouds, where we detected some well-known features and indications of new ones.
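
    The tangential velocities mentioned above follow from Gaia proper motions and an adopted distance via the standard conversion v_t = 4.7405 * mu [mas/yr] * d [kpc] km/s. A minimal sketch, with illustrative numbers rather than values from the paper:

        import numpy as np

        KAPPA = 4.7405  # km/s per (mas/yr * kpc)

        def tangential_velocity(pmra_masyr, pmdec_masyr, distance_kpc):
            """Tangential velocity components (km/s) from proper motions."""
            v_ra = KAPPA * pmra_masyr * distance_kpc
            v_dec = KAPPA * pmdec_masyr * distance_kpc
            return v_ra, v_dec

        # e.g. an LMC-like star at ~50 kpc moving at (1.85, 0.23) mas/yr
        v_ra, v_dec = tangential_velocity(1.85, 0.23, 50.0)
        print(f"v_ra* = {v_ra:.0f} km/s, v_dec = {v_dec:.0f} km/s, "
              f"|v_t| = {np.hypot(v_ra, v_dec):.0f} km/s")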

    The Gaia mission

    Get PDF
    Gaia is a cornerstone mission in the science programme of the European Space Agency (ESA). The spacecraft construction was approved in 2006, following a study in which the original interferometric concept was changed to a direct-imaging approach. Both the spacecraft and the payload were built by European industry. The involvement of the scientific community focusses on data processing for which the international Gaia Data Processing and Analysis Consortium (DPAC) was selected in 2007. Gaia was launched on 19 December 2013 and arrived at its operating point, the second Lagrange point of the Sun-Earth-Moon system, a few weeks later. The commissioning of the spacecraft and payload was completed on 19 July 2014. The nominal five-year mission started with four weeks of special, ecliptic-pole scanning and subsequently transferred into full-sky scanning mode. We recall the scientific goals of Gaia and give a description of the as-built spacecraft that is currently (mid-2016) being operated to achieve these goals. We pay special attention to the payload module, the performance of which is closely related to the scientific performance of the mission. We provide a summary of the commissioning activities and findings, followed by a description of the routine operational mode. We summarise scientific performance estimates on the basis of in-orbit operations. Several intermediate Gaia data releases are planned and the data can be retrieved from the Gaia Archive, which is available through the Gaia home page. http://www.cosmos.esa.int/gai

    Scalable Unsupervised Learning for Deep Discrete Generative Models

    No full text
    Efficient, scalable training of probabilistic generative models is a highly sought-after goal in the field of machine learning. One core challenge is that maximum likelihood optimization of generative parameters is computationally intractable for all but a few, mostly elementary, models. Variational approximations of the Expectation-Maximization (EM) algorithm offer a generic, powerful framework for deriving training algorithms as a function of the chosen form of variational distributions. Also, the use of discrete latent variables in such generative models is considered important to capture the generative process of real-world data, which, for instance, has motivated research on Variational Autoencoders (VAEs) with discrete latents. Here we make use of truncated posteriors as variational distributions and show how the resulting variational approximation of the EM algorithm can be used to establish a close link between evolutionary algorithms (EAs) and the training of probabilistic generative models with binary latent variables. We obtain training algorithms that effectively improve the tractable likelihood lower bound of truncated posteriors. After verifying the applicability and scalability of this novel EA-based training on shallow models, we demonstrate how the technique can be combined with standard optimization of a deep generative model's parameters using auto-differentiation tools and backpropagation, in order to train discrete-latent VAEs. Our approach diverges significantly from standard VAE training and sidesteps some of its standard features such as sampling approximations, the reparameterization trick, and amortization. For quantitative comparison with other approaches, we used a common image denoising benchmark. In contrast to supervised neural networks, VAEs can denoise a single image without previous training on clean data or on large image datasets. While using a relatively elementary network architecture, we find our model to be competitive with the state of the art in this 'zero-shot' setting. A review of the open-source software framework developed for training discrete-latent generative models with truncated posterior approximations is also provided. Our results suggest that EA-based training of discrete-latent VAEs can represent a well-performing, flexible, scalable, and arguably more direct training scheme than previously proposed alternatives, opening the door to a large number of possible future research directions.
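
    The combination of EA-based selection of latent states with gradient-based updates of the decoder parameters can be sketched as follows. This is an illustration under simplifying assumptions, not the code reviewed in the thesis: a linear Gaussian decoder stands in for the deep decoder, so the gradient of the lower bound is written by hand instead of coming from auto-differentiation.

        import numpy as np

        rng = np.random.default_rng(1)
        N, H, D, K = 100, 8, 16, 10
        sigma2, lr = 0.1, 1e-3

        X = rng.normal(size=(N, D))                           # data (placeholder)
        S = rng.integers(0, 2, size=(N, K, H)).astype(float)  # truncated sets (from the EA step)
        W = rng.normal(size=(D, H)) * 0.1                     # linear decoder weights

        def truncated_weights(x, S_n, W):
            """q(s) proportional to p(x, s), renormalized over the truncated set."""
            recon = S_n @ W.T
            logits = -0.5 * np.sum((x - recon) ** 2, axis=1) / sigma2
            logits -= logits.max()                # stabilize the softmax
            q = np.exp(logits)
            return q / q.sum()

        for _ in range(20):                       # M-step-like gradient ascent on W
            grad = np.zeros_like(W)
            for n in range(N):
                q = truncated_weights(X[n], S[n], W)
                resid = X[n] - S[n] @ W.T         # (K, D) residual per kept state
                grad += (q[:, None] * resid).T @ S[n] / sigma2
            W += lr * grad / N
        # With a deep decoder, `grad` would instead come from backpropagation
        # through the reconstruction, with q(s) still supplied by the EA step.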

    openlab summer students' lightning talks 1

    No full text

    Using RDataFrame, ROOT’s declarative analysis tool, in a CMS physics study

    No full text
    With the expected large increase in the amount of available data in LHC Run 3, now more than ever HEP scientists must be able to efficiently write robust, performant analysis software that can take full advantage of the underlying hardware. Multicore computing resources are commonplace, and current trends in scientific computing include the increased availability of manycore architectures. The HEP community is not alone in this challenge: the data science industry has developed solutions that we can learn from and adapt to HEP-specific problems. This is the context in which the ROOT team (and here especially Enrico) developed RDataFrame, a Swiss-army knife for data manipulation that provides a high-level interface, in C++ and Python, as well as transparent optimizations such as multi-thread data parallelism. This new tool supports typical HEP workflows and data formats and has been designed to flexibly scale up from data exploration on a laptop to the analysis of millions of events exploiting hundreds of CPU cores. As a result, ROOT users can now write simpler code that runs faster. The first part of the seminar will introduce RDataFrame, showcase its most prominent features, and outline current developments and several real-world use cases. Precision measurements are often affected by large systematic uncertainties related to the models used in simulation, and progress can be made by extracting features directly from data. However, the analysis of unprecedented numbers of events on a sustainable timescale is not possible with standard techniques. The possibility of using ROOT's RDataFrame to overcome these limitations is demonstrated within the setup of a CMS physics study in the second part of this seminar.
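
    As a flavour of the programming model, a minimal RDataFrame sketch; the tree name, file, and cuts are placeholders.

        import ROOT

        ROOT.EnableImplicitMT()  # transparent multi-thread data parallelism

        df = ROOT.RDataFrame("Events", "sample.root")
        h = (df.Filter("nMuon == 2", "exactly two muons")
               .Define("pt_sum", "Muon_pt[0] + Muon_pt[1]")
               .Histo1D(("pt_sum", ";p_{T} sum [GeV];Events", 64, 0.0, 256.0),
                        "pt_sum"))

        print(f"selected {h.GetEntries():.0f} events")  # triggers the lazy event loop
        df.Report().Print()                             # per-cut efficiency summary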

    The linguistic markers of argumentation in advertising texts in reading classes of Portuguese for foreigners.

    No full text
    Experiments at the Large Hadron Collider (LHC) produce tens of petabytes of new data in ROOT format per year that need to be processed and analysed. In the next decade, following the planned upgrades of the LHC and its detectors, this data production rate is expected to increase at least ten-fold. Therefore, optimizing the ROOT I/O subsystem is of critical importance to the success of the LHC physics programme. This contribution presents ROOT's approach for writing data from multiple threads to a single output file in an efficient way. Technical aspects of the implementation (the TBufferMerger class) and programming-model examples are described. Measurements of runtime performance and the overall improvement relative to the case of serial data writing are also discussed.
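
    A minimal sketch of the parallel-writing pattern via PyROOT, assuming the ROOT::TBufferMerger class is used from Python threads; thread count, tree layout, and fill loop are illustrative.

        import ROOT
        from array import array
        from concurrent.futures import ThreadPoolExecutor

        ROOT.EnableThreadSafety()  # required before using ROOT from several threads

        # ROOT::TBufferMerger lives in the C++ `ROOT` namespace, hence ROOT.ROOT here
        merger = ROOT.ROOT.TBufferMerger("merged.root")

        def worker(worker_id, n_events=1000):
            f = merger.GetFile()                  # thread-local in-memory file
            tree = ROOT.TTree("events", "events")
            tree.SetDirectory(f)                  # attach the tree to this worker's file
            x = array("f", [0.0])
            tree.Branch("x", x, "x/F")
            for i in range(n_events):
                x[0] = worker_id + i * 1e-3
                tree.Fill()
            f.Write()                             # queues this buffer for merging

        with ThreadPoolExecutor(max_workers=4) as pool:
            for wid in range(4):
                pool.submit(worker, wid)
        # merged.root is finalized when the merger object is destroyed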